Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics collect stress test #2247

Closed

Conversation

fraillt
Copy link
Contributor

@fraillt fraillt commented Oct 25, 2024

Changes

Added two tests (delta and cumulative temporality) to measuring how much metrics collection phase impact measurements throughput.
Implementation consists of running collection phase in the loop, while simultaneously testing measurements throughput.
The main concern isn't the speed of collection phase itself, but ensuring that Opentelemetry Metrics doesn't significantly contribute to p99 latency.
I didn't customize it for different metrics type, idea is to eventually land #2117, so that I could unify collection phase for other metrics #2145.

Additionally, significantly improved code reuse for stress crate.

Here are some results on my machine:

cargo run --release --package stress --bin metrics_collect -- delta
    Finished `release` profile [optimized] target(s) in 0.07s
     Running `target/release/metrics_collect delta`
     17.38 measurements/ms
     52.54 measurements/it
   3023.88 μs/it
   
cargo run --release --package stress --bin metrics_collect -- cumulative
    Finished `release` profile [optimized] target(s) in 0.07s
     Running `target/release/metrics_collect cumulative`
      9.14 measurements/ms
     24.35 measurements/it
   2665.30 μs/it

For curiosity I have also implemented #2117 and #2145 locally, and results are as follows:

cargo run --release --package stress --bin metrics_collect -- delta
    Finished `release` profile [optimized] target(s) in 0.07s
     Running `target/release/metrics_collect delta`
     56.38 measurements/ms
    154.24 measurements/it
   2735.73 μs/it
   
cargo run --release --package stress --bin metrics_collect -- cumulative
    Finished `release` profile [optimized] target(s) in 0.07s
     Running `target/release/metrics_collect cumulative`
    840.84 measurements/ms
   2077.64 measurements/it
   2470.90 μs/it   

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@fraillt fraillt requested a review from a team as a code owner October 25, 2024 05:31
Copy link

linux-foundation-easycla bot commented Oct 25, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: fraillt / name: Mindaugas Vinkelis (b4fffe6)
  • ✅ login: lalitb / name: Lalit Kumar Bhasin (7265639)

@fraillt fraillt force-pushed the metrics_collect_stress_test branch 2 times, most recently from 02be567 to 1fc6d80 Compare October 25, 2024 11:46
@utpilla
Copy link
Contributor

utpilla commented Oct 25, 2024

@fraillt Thanks for the continued interest in making metrics better! I have two suggestions around sending PRs that I would like you to consider. These are mainly directed towards making the PR review process easy and simple (which would in turn result in the getting PRs merged faster).

  1. Send targeted PRs aimed at solving one thing at a time. For example, there's a lot of things going on this PR. There's a new test being added as the PR title suggests, there is moving around of files, refactoring around reuse of logic across projects. Each of these different additions/enhancements could be added incrementally instead of doing it in a single shot.

  2. Avoid force pushing the commit. With force push, we have no way to know how many files changed since your last commit. We then have to review all the files again to ensure that the latest commit only made the changes that we were expecting. Retaining the individual commits makes it easy for reviewer.

@fraillt
Copy link
Contributor Author

fraillt commented Oct 29, 2024

@fraillt Thanks for the continued interest in making metrics better! I have two suggestions around sending PRs that I would like you to consider. These are mainly directed towards making the PR review process easy and simple (which would in turn result in the getting PRs merged faster).

1. Send targeted PRs aimed at solving one thing at a time. For example, there's a lot of things going on this PR. There's a new test being added as the PR title suggests, there is moving around of files, refactoring around reuse of logic across projects. Each of these different additions/enhancements could be added incrementally instead of doing it in a single shot.

2. Avoid force pushing the commit. With force push, we have no way to know how many files changed since your last commit. We then have to review all the files again to ensure that the latest commit only made the changes that we were expecting. Retaining the individual commits makes it easy for reviewer.

I'll try to be more careful regarding force-push in the future.
Regarding targeted PR, I see your point, that targeted PRs are easier to review, but sometimes they doesn't make sense in isolation. For instance I moved SharedReader from opentelemetry-sdk/benches/metric.rs to opentelemetry-sdk/src/testing/metrics/metric_reader.rs, in isolation that looks like a pointless effort, but I do need this functionality for metrics_collect.rs.

Basically what I'm trying to say, is that I'm afraid that if I create PR just for refactoring, it might look worthless on its own...

All changes that was done in this PR does relate to one file that brings new functionality stress/src/bin/metrics_collect.rs.

@utpilla
Copy link
Contributor

utpilla commented Oct 29, 2024

Basically what I'm trying to say, is that I'm afraid that if I create PR just for refactoring, it might look worthless on its own...

All changes that was done in this PR does relate to one file that brings new functionality stress/src/bin/metrics_collect.rs.

I think we could still have a simpler split up of PRs. Something like this:

  1. Send a PR that adds the new stress test. (along with the boiler plate code: SharedReader etc.) You don't have to add the reuse related changes at this point.
  2. Once that gets merged, you could send a refactoring PR to allow for reuse of SharedReader. This would showcase the motivation for reuse as well.

@fraillt fraillt force-pushed the metrics_collect_stress_test branch from 1fc6d80 to b4fffe6 Compare October 30, 2024 20:26
@fraillt
Copy link
Contributor Author

fraillt commented Oct 30, 2024

I force-pushed, because it basically delete everything I did, except for one file, so I guess it is ok :).
I'll not force-push, once someone reviews and I'll update my revision to address review comments.

barrier.wait();
let now = Instant::now();
let mut count = 0;
while is_collecting.load(Ordering::Acquire) {
Copy link
Contributor

@utpilla utpilla Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could have a much simpler and effective setup here. If we want to know whether running collect stops the world, it's better to spawn a thread that keeps calling reader.collect on a loop. And have other threads, record measurements simultaneously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that initially. The problem was that collection phase was not realistic as it had 0 measurements and basically held lock in a loop. So I decided to make it more realistic, by generating some measurements so collection wouldn't be empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that collection phase was not realistic as it had 0 measurements and basically held lock in a loop

I don't follow. You could always record some measurements before you call collect. My concern is around the way this test is setup. The test is using some custom "iterations" which seems unnecessary. You could simply do this instead:

  1. Emit some measurements.
  2. Spawn a thread that runs collect in a loop.
  3. Spawn additional threads that record measurements.
  4. Calculate throughput for the measurements recorded in step 3.

If doing the above, leads to zero measurements being recorded, then we have our answer: Collect is going to "stop the world" as long as it runs.

As I mentioned in this comment, unless we plan to use a more efficient synchronization mechanism in ValueMap, this stress test would not be adding any value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, this test is adding value, as it is able to measure difference with different "collect" implementation.
Second, running collect in the loop doesn't work with delta temporality, because after first run there will be zero measurements and all other iterations will basically hold write lock in the loop. While in reality no one runs collect in loop and usually there are many measurements and its important that write lock is held as short as possible (which is not the case with current implementation, hence I was able to improve it and this test actually measures it).
I could probably have different tests for delta an cumulative temporality, but I decided to emulate realistic environment.

Copy link
Contributor

@utpilla utpilla Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While in reality no one runs collect in loop

In reality, no one runs the collect the way it's run in this stress test either. Collect would usually be run periodically in 10/30/60 seconds intervals.

First of all, this test is adding value, as it is able to measure difference with different "collect" implementation.

usually there are many measurements and its important that write lock is held as short as possible (which is not the case with current implementation, hence I was able to improve it and this test actually measures it).

This test claims to measure update throughput "while collect is running". The actual implementation of this test however relies on squeezing in some updates before collect runs. That's not the same as testing "while collect is running".

I would love to improve collect to take the write lock for the shortest possible time, but it's better tested using a benchmark.

Copy link
Contributor Author

@fraillt fraillt Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual implementation of this test however relies on squeezing in some updates before collect runs.

There's a lot of truth to this statement, but none the less it appears to measure something useful :)

temporality type of change before change (measurements/ms) after change (measurements/ms)
Cumulative changed write to read lock of hashmap 9 840
Delta reduce the amount of time write lock is held 17 56

Ideally I would like to "catch" changes to how attribute-set hashmap are locked with both: existing types of measurements (with existing attribute-set and new attribute-set combination).
Maybe we need to tests these things separately... I don't know, any ideas are welcome :)

let barrier = Barrier::new(num_threads + 1);
std::thread::scope(|s| {
// first create bunch of measurements,
// so that collection phase wouldn't be "empty"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done without spawning new threads. (before you spawn threads for running collect and recording measurements)

};
is_collecting.store(true, Ordering::Release);
barrier.wait();
reader.collect(&mut rm).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR description, you mentioned that you saw a major improvement when running this with the changes for #2145 locally. I'm curious, the code changes for #2145 are mostly related to reuse of collect code across instruments. How is that improving the perf numbers?

Throughput perf here should mostly depend on the thread synchronization mechanism used for collect and update. Currently, it's using a RwLock in ValueMap and the collect operation requires a write lock which would "stop the world" until collect has the write lock. Unless you change that, I wouldn't expect any significant improvement in throughput here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For delta temporality I was able to reduce the amount of time a write lock is held while collecting.
For cumulative temporality I simply changed to read lock (currently there is write lock, although cumulative temporality doesn't modify anything, and IMO locking individual measurements is sufficient and much better, then "stop the world until all measurements are read" approach). Although read lock is better, but measurements with new attribute-sets will still be locked. I see two more alternatives:

  • clone entire attribute-set hashmap (acquire read lock, clone hashmap, release lock, iterate individual measurements)
  • implement some sort of sharding (more throughtput at the cost of single measurement performance)

In any case, there are ways to improve it:) so I created this PR so we could actually measure different ideas.

Copy link
Contributor

@utpilla utpilla Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good ideas!

For cumulative temporality I simply changed to read lock (currently there is write lock, although cumulative temporality doesn't modify anything,

Great! In that case, we should be able test update throughput while collect runs in a loop for Cumulative and a use a simple benchmark for Delta. 🙂

Copy link
Member

@lalitb lalitb Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implement some sort of sharding (more throughtput at the cost of single measurement performance)

I did try sharding approach earlier in #1564 - to reduce the write-lock duration by locking only the specific shard of measurements during collection. However, I encountered some concurrency issues. It’s still an approach worth revisiting at some point.

Great! In that case, we should be able test update throughput while collect runs in a loop for Cumulative and a use a simple benchmark for Delta. 🙂

+1

Copy link

codecov bot commented Nov 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.3%. Comparing base (e1860c7) to head (7265639).
Report is 35 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2247   +/-   ##
=====================================
  Coverage   79.3%   79.3%           
=====================================
  Files        121     121           
  Lines      20968   20968           
=====================================
  Hits       16646   16646           
  Misses      4322    4322           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@fraillt
Copy link
Contributor Author

fraillt commented Nov 22, 2024

#2323 solves same problem in much more precise way.

@fraillt fraillt closed this Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants